Take-home Exercise 1: Understanding and Analysing Thailand Road Accident Data

Published

September 5, 2024

Modified

September 22, 2024

1 Thailand’s Killer Roads

Road traffic injuries pose a significant public health issue in Thailand, with a high number of fatalities, injuries, and disabilities each year. These incidents have a profound impact not only on the victims and their families but also on society and the nation as a whole. According to the World Health Organization (WHO), approximately 20,000 people lose their lives in road accidents annually, equating to about 56 deaths each day.

Despite numerous government initiatives to reduce road casualties, the situation has seen minimal improvement. These issues underscore the need for a deeper understanding of road traffic accidents, which can be largely attributed to two primary factors: behavioral and environmental.

Behavioral factors, such as driver behavior and performance, are significant contributors to traffic accidents. These factors include driving style and skills, as well as direct and indirect driving behaviors. Environmental factors encompass conditions such as poor visibility during adverse weather (e.g., heavy rain or fog) and challenging road features (e.g., sharp bends, slippery slopes, and blind spots).

Previous studies have highlighted the value of Spatial Point Patterns Analysis in examining road traffic accidents. However, these studies often focus solely on either behavioral or environmental factors, neglecting the influence of temporal factors such as season, day of the week, or time of day.

In light of this, it is essential to investigate the factors affecting road traffic accidents in the Bangkok Metropolitan Region (BMR) using both spatial and spatio-temporal point patterns analysis methods. This approach aims to identify the leading causes of accidents and provide insights that can guide the development of effective policies and interventions to enhance road safety.

2 Getting Started

2.1 Loading Packages

In this exercise, we will be using the following packages:

Package Description
sf For importing, managing, and handling geospatial data
spatstat For point pattern analysis. In this hands-on exercise, it will be used to perform 1st- and 2nd-order spatial point patterns analysis and to derive a kernel density estimation layer
sfdep Used to compute spatial weights
tmap For thematic mapping
leaflet For interactive maps
raster For raster data manipulation functions
spNetwork For spatial analysis on networks
tidyverse For non-spatial data wrangling
DT, knitr, kableExtra For building tables
gtsummary For publication-ready analytical and summary tables

The following code chunk uses p_load() of the pacman package to check whether the packages listed above are installed; any missing packages are installed, and all of them are then loaded into the R session.

pacman::p_load(sf, sfdep, 
               spatstat,
               tmap, leaflet, 
               raster,
               spNetwork,
               tidyverse,
               DT, knitr, kableExtra, gtsummary)

2.2 The Data

File Source

Thailand - Subnational Administrative Boundaries, in SHP file format
The shapefile contains four administrative levels: 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon). We will use level 1 to filter for the Bangkok Metropolitan Region.
Source: Humanitarian Data Exchange

Thailand Roads (OpenStreetMap Export), in SHP file format
Source: Humanitarian Data Exchange

Thailand Road Accident [2019-2022], in CSV format
This dataset provides comprehensive statistics on road accidents recorded in Thailand from 2019 to 2022, including time, location, accident type, weather condition, and road characteristics.
Source: Kaggle

2.2.1 Traffic Accident Data

read_csv() of the readr package allows us to import CSV files into R.

accidents <- read_csv("data/geospatial/thai_road_accident_2019_2022.csv")
glimpse(accidents)
Rows: 81,735
Columns: 18
$ acc_code                    <dbl> 571905, 3790870, 599075, 571924, 599523, 5…
$ incident_datetime           <dttm> 2019-01-01 00:00:00, 2019-01-01 00:03:00,…
$ report_datetime             <dttm> 2019-01-02 06:11:00, 2020-02-20 13:48:00,…
$ province_th                 <chr> "ลพบุรี", "อุบลราชธานี", "ประจวบคีรีขันธ์", "เชียงใ…
$ province_en                 <chr> "Loburi", "Ubon Ratchathani", "Prachuap Kh…
$ agency                      <chr> "department of rural roads", "department o…
$ route                       <chr> "แยกทางหลวงหมายเลข 21 (กม.ที่ 31+000) - บ้านวั…
$ vehicle_type                <chr> "motorcycle", "private/passenger car", "mo…
$ presumed_cause              <chr> "driving under the influence of alcohol", …
$ accident_type               <chr> "other", "rollover/fallen on straight road…
$ number_of_vehicles_involved <dbl> 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, …
$ number_of_fatalities        <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 3, 0, 0, 1, 0, 0, …
$ number_of_injuries          <dbl> 2, 2, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 1, 1, …
$ weather_condition           <chr> "clear", "clear", "clear", "clear", "clear…
$ latitude                    <dbl> 14.959105, 15.210738, 12.374259, 18.601721…
$ longitude                   <dbl> 100.87346, 104.86269, 99.90795, 98.80420, …
$ road_description            <chr> "straight road", "straight road", "wide cu…
$ slope_description           <chr> "no slope", "no slope", "slope area", "no …

A quick look into the data using glimpse() of the dplyr package reveals that there are 18 variables in the data, including:

Column Name Description
acc_code The accident code or identifier.
incident_datetime The date and time of the accident occurrence.
report_datetime The date and time when the accident was reported.
province_th The name of the province in Thailand, written in Thai.
province_en The name of the province in Thailand, written in English.
agency The government agency responsible for the road and traffic management.
route The route or road segment where the accident occurred.
vehicle_type The type of vehicle involved in the accident.
presumed_cause The presumed cause or reason for the accident.
accident_type The type or nature of the accident.
number_of_vehicles_involved The number of vehicles involved in the accident.
number_of_fatalities The number of fatalities resulting from the accident.
number_of_injuries The number of injuries resulting from the accident.
weather_condition The weather condition at the time of the accident.
latitude The latitude coordinate of the accident location.
longitude The longitude coordinate of the accident location.
road_description The description of the road type or configuration where the accident occurred.
slope_description The description of the slope condition at the accident location.

Since the province_th and route columns are in Thai, I will remove them. The province name is already available in English in province_en, and the accident locations can be mapped using the latitude and longitude columns. Additionally, I will remove the report_datetime column and use incident_datetime for our analysis.

accidents <- accidents %>% 
  select(-c(province_th, 
            route, 
            report_datetime))

The gtsummary package provides us with descriptive statistics of the dataset and also includes the amount of missingness in each variable.
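The code chunk that produced the table below is not shown; a call along the following lines (a sketch, not the original) yields this layout. By default, tbl_summary() reports median (IQR) for numeric variables and n (%) for categorical ones, while add_n() adds the column of non-missing counts.

```r
# Sketch of a gtsummary call producing a summary table of all variables.
# tbl_summary() picks sensible default statistics per variable type;
# add_n() appends the count of non-missing observations for each variable.
accidents %>%
  tbl_summary() %>%
  add_n()
```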

Table of Variable Summary
Characteristic N N = 81,7351
acc_code 81,735 3,824,084 (3,789,460, 5,831,089)
incident_datetime 81,735 2019-01-01 to 2022-12-31 23:55:00
province_en 81,735
    Amnat Charoen
196 (0.2%)
    Ang Thong
1,125 (1.4%)
    Bangkok
6,439 (7.9%)
    buogkan
583 (0.7%)
    Buri Ram
327 (0.4%)
    Chachoengsao
1,407 (1.7%)
    Chai Nat
692 (0.8%)
    Chaiyaphum
471 (0.6%)
    Chanthaburi
1,360 (1.7%)
    Chiang Mai
3,407 (4.2%)
    Chiang Rai
858 (1.0%)
    Chon Buri
4,120 (5.0%)
    Chumphon
1,039 (1.3%)
    Kalasin
659 (0.8%)
    Kamphaeng Phet
822 (1.0%)
    Kanchanaburi
1,165 (1.4%)
    Khon Kaen
1,224 (1.5%)
    Krabi
741 (0.9%)
    Lampang
1,018 (1.2%)
    Lamphun
744 (0.9%)
    Loburi
1,036 (1.3%)
    Loei
742 (0.9%)
    Mae Hong Son
308 (0.4%)
    Maha Sarakham
832 (1.0%)
    Mukdahan
668 (0.8%)
    Nakhon Nayok
386 (0.5%)
    Nakhon Pathom
891 (1.1%)
    Nakhon Phanom
462 (0.6%)
    Nakhon Ratchasima
3,330 (4.1%)
    Nakhon Sawan
1,403 (1.7%)
    Nakhon Si Thammarat
1,363 (1.7%)
    Nan
931 (1.1%)
    Narathiwat
477 (0.6%)
    Nong Bua Lam Phu
288 (0.4%)
    Nong Khai
297 (0.4%)
    Nonthaburi
827 (1.0%)
    Pathum Thani
1,923 (2.4%)
    Pattani
395 (0.5%)
    Phangnga
328 (0.4%)
    Phatthalung
1,068 (1.3%)
    Phayao
543 (0.7%)
    Phetchabun
1,704 (2.1%)
    Phetchaburi
873 (1.1%)
    Phichit
226 (0.3%)
    Phitsanulok
800 (1.0%)
    Phra Nakhon Si Ayutthaya
1,571 (1.9%)
    Phrae
1,466 (1.8%)
    Phuket
444 (0.5%)
    Prachin Buri
809 (1.0%)
    Prachuap Khiri Khan
1,141 (1.4%)
    Ranong
140 (0.2%)
    Ratchaburi
497 (0.6%)
    Rayong
770 (0.9%)
    Roi Et
721 (0.9%)
    Sa Kaeo
699 (0.9%)
    Sakon Nakhon
973 (1.2%)
    Samut Prakan
2,241 (2.7%)
    Samut Sakhon
1,015 (1.2%)
    Samut Songkhram
524 (0.6%)
    Saraburi
1,158 (1.4%)
    Satun
321 (0.4%)
    Si Sa Ket
739 (0.9%)
    Sing Buri
448 (0.5%)
    Songkhla
1,231 (1.5%)
    Sukhothai
1,305 (1.6%)
    Suphan Buri
3,056 (3.7%)
    Surat Thani
1,983 (2.4%)
    Surin
843 (1.0%)
    Tak
1,859 (2.3%)
    Trang
784 (1.0%)
    Trat
469 (0.6%)
    Ubon Ratchathani
751 (0.9%)
    Udon Thani
946 (1.2%)
    unknown
34 (<0.1%)
    Uthai Thani
503 (0.6%)
    Uttaradit
949 (1.2%)
    Yala
343 (0.4%)
    Yasothon
504 (0.6%)
agency 81,735
    department of highways
75,304 (92%)
    department of rural roads
5,115 (6.3%)
    expressway authority of thailand
1,316 (1.6%)
vehicle_type 81,735
    4-wheel pickup truck
28,445 (35%)
    6-wheel truck
3,019 (3.7%)
    7-10-wheel truck
2,364 (2.9%)
    bicycle
257 (0.3%)
    large passenger vehicle
421 (0.5%)
    large truck with trailer
6,257 (7.7%)
    motorcycle
14,874 (18%)
    motorized tricycle
294 (0.4%)
    other
3,513 (4.3%)
    passenger pickup truck
325 (0.4%)
    pedestrian
219 (0.3%)
    private/passenger car
20,814 (25%)
    three-wheeled vehicle
28 (<0.1%)
    tractor/agricultural vehicle
63 (<0.1%)
    van
842 (1.0%)
presumed_cause 81,735
    abrupt lane change
134 (0.2%)
    aggressive driving/overtaking
12 (<0.1%)
    brake/anti-lock brake system failure
55 (<0.1%)
    cutting in closely by people/vehicles/animals
6,724 (8.2%)
    dangerous curve
61 (<0.1%)
    debris/obstruction on the road
294 (0.4%)
    disabled vehicle without proper signals
37 (<0.1%)
    disabled vehicle without proper signals/signs
5 (<0.1%)
    driving in the wrong lane
37 (<0.1%)
    driving under the influence of alcohol
1,464 (1.8%)
    driving without headlights/illumination
20 (<0.1%)
    external disturbance
2 (<0.1%)
    failure to signal enter/exit parking
43 (<0.1%)
    failure to yield right of way
133 (0.2%)
    failure to yield/signal
215 (0.3%)
    falling asleep
4,700 (5.8%)
    ignoring stop sign while leaving intersection
79 (<0.1%)
    illegal overtaking
445 (0.5%)
    inadequate visibility
13 (<0.1%)
    insufficient light
69 (<0.1%)
    internal disturbance
1 (<0.1%)
    loss of control
49 (<0.1%)
    medical condition
40 (<0.1%)
    narrow road
10 (<0.1%)
    navigation equipment failure
1 (<0.1%)
    no presumed cause related to driver
1 (<0.1%)
    no presumed cause related to road conditions
4 (<0.1%)
    no presumed cause related to vehicle conditions
1 (<0.1%)
    no road divider lines
1 (<0.1%)
    no traffic light system
1 (<0.1%)
    no traffic signs
9 (<0.1%)
    obstruction in sight
7 (<0.1%)
    other
1,576 (1.9%)
    overloaded vehicle
176 (0.2%)
    repair/construction on the road
6 (<0.1%)
    reversing vehicle
70 (<0.1%)
    road in poor condition
7 (<0.1%)
    running red lights/traffic signals
911 (1.1%)
    slippery road
85 (0.1%)
    speeding
60,373 (74%)
    straddling lanes
31 (<0.1%)
    sudden stop
48 (<0.1%)
    tailgating
181 (0.2%)
    traffic light system failure
3 (<0.1%)
    turn signal system failure
2 (<0.1%)
    unfamiliarity with the route/unskilled driving
677 (0.8%)
    using mobile phone while driving
33 (<0.1%)
    using psychoactive substances
1 (<0.1%)
    vehicle electrical system failure
11 (<0.1%)
    vehicle equipment failure
2,747 (3.4%)
    worn-out/tire blowout
127 (0.2%)
    เส้นแบ่งทิศทางจราจรชำรุด
1 (<0.1%)
    ป้ายจราจรชำรุด
1 (<0.1%)
    มึนเมาจากแอลกอฮอล์
1 (<0.1%)
accident_type 81,735
    collision at intersection corner
1,067 (1.3%)
    collision during overtaking
83 (0.1%)
    collision with obstruction (on road surface)
2,742 (3.4%)
    head-on collision (not overtaking)
3,944 (4.8%)
    other
4,188 (5.1%)
    pedestrian collision
945 (1.2%)
    rear-end collision
24,901 (30%)
    rollover/fallen on curved road
10,443 (13%)
    rollover/fallen on straight road
33,046 (40%)
    side collision
336 (0.4%)
    turning/retreating collision
40 (<0.1%)
number_of_vehicles_involved 81,735 1.00 (1.00, 2.00)
number_of_fatalities 81,735 0.00 (0.00, 0.00)
number_of_injuries 81,735 0.00 (0.00, 1.00)
weather_condition 81,735
    clear
69,943 (86%)
    dark
368 (0.5%)
    foggy
544 (0.7%)
    land slide
1 (<0.1%)
    natural disaster
47 (<0.1%)
    other
162 (0.2%)
    rainy
10,670 (13%)
latitude 81,376 14.5 (13.5, 16.6)
    NA
359
longitude 81,376 100.56 (99.89, 101.29)
    NA
359
road_description 81,735
    bridge (across river/canal)
5 (<0.1%)
    connecting to private area
463 (0.6%)
    connecting to public/commercial area
1,001 (1.2%)
    connecting to school area
121 (0.1%)
    four-way intersection
348 (0.4%)
    grade-separated intersection/ramps
184 (0.2%)
    lane-changing area
7 (<0.1%)
    merge lane
19 (<0.1%)
    motorcycle lane
1 (<0.1%)
    other
7,614 (9.3%)
    pedestrian path
1 (<0.1%)
    roundabout
11 (<0.1%)
    sharp curve
616 (0.8%)
    straight road
58,183 (71%)
    t-intersection
414 (0.5%)
    u-turn area
10 (<0.1%)
    wide curve
12,552 (15%)
    y-intersection
184 (0.2%)
    zebra crossing/pedestrian crossing
1 (<0.1%)
slope_description 81,735
    no slope
65,797 (81%)
    other
11,575 (14%)
    slope area
4,363 (5.3%)
1 Median (IQR); Range; n (%)

The code chunk below performs several steps to prepare and transform the dataset:

  1. The filter() function from the dplyr package is used to remove rows with missing or empty longitude and latitude values, ensuring that the dataset contains only valid spatial data.
  2. The sf package’s st_as_sf() function converts the data frame into a spatial object (simple feature) using the longitude and latitude columns, setting the coordinate reference system (CRS) to EPSG 4326.
  3. The spatial data is reprojected to EPSG 32647 (a local UTM projection used in Thailand) using st_transform() from the sf package.
  4. The filter() function is also applied to retain data exclusively from the Bangkok Metropolitan Region (BMR) by filtering for specific provinces.
accidents_bmr <- accidents %>% 
  
  # Removes rows with missing longitude and latitude values
  # (the coordinates are numeric, so an NA check is sufficient)
  filter(!is.na(longitude),
         !is.na(latitude)) %>% 
  
  # Converts data to an sf object using longitude and latitude 
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326) %>%
  
  # Transforms to the projection used in Thailand
  st_transform(crs = 32647) %>% 
  
  # Filter for BMR data
  filter(province_en %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon")) 

Let’s perform a check for duplicated records before we move on. The code chunk below identifies all rows in the accidents_bmr dataframe that have an exact duplicate (i.e., another row with the same values in all columns) using group_by_all().

duplicate <- accidents_bmr %>% 
  group_by_all() %>% 
  filter(n()>1) %>% 
  ungroup()
  
duplicate
Simple feature collection with 0 features and 13 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: WGS 84 / UTM zone 47N
# A tibble: 0 × 14
# ℹ 14 variables: acc_code <dbl>, incident_datetime <dttm>, province_en <chr>,
#   agency <chr>, vehicle_type <chr>, presumed_cause <chr>,
#   accident_type <chr>, number_of_vehicles_involved <dbl>,
#   number_of_fatalities <dbl>, number_of_injuries <dbl>,
#   weather_condition <chr>, road_description <chr>, slope_description <chr>,
#   geometry <GEOMETRY [m]>

Results confirm that there are no duplicated records.

Let’s take a quick glance at the points:

tmap_mode('plot')

tm_shape(accidents_bmr) +
  tm_dots(col = "#800200",
          alpha=0.4, 
          size=0.05) +
  tm_layout(main.title = "Accidents",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

2.2.2 Administrative Boundary

The adminboundary dataset, which we downloaded from HDX, is in ESRI shapefile format. To use this data in R, we need to import it as an sf object using the st_read() function of the sf package. This function reads the shapefile and returns an sf object that can be used for further analysis.

In the code chunk below, the %>% operator pipes the output of st_read() to the st_transform() function. Since our study area is in Thailand, we reproject the data to the projected coordinate reference system commonly used there, EPSG 32647 (WGS 84 / UTM zone 47N). st_transform() transforms the coordinate reference system of the sf object to 32647.

adminboundary <- st_read(dsn = "data/geospatial", 
                layer = "tha_admbnda_adm1_rtsd_20220121") %>% 
  st_transform(crs = 32647)
Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source 
  `C:\kytjy\ISSS626-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
st_crs(adminboundary)
Coordinate Reference System:
  User input: EPSG:32647 
  wkt:
PROJCRS["WGS 84 / UTM zone 47N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 47N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",99,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Navigation and medium accuracy spatial referencing."],
        AREA["Between 96°E and 102°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Laos. Malaysia - West Malaysia. Mongolia. Myanmar (Burma). Russian Federation. Thailand."],
        BBOX[0,96,84,102]],
    ID["EPSG",32647]]

A preliminary look into the adminboundary data shows that ADM1_EN can be used to filter for the six provinces of the BMR.

glimpse(adminboundary)
Rows: 77
Columns: 17
$ Shape_Leng <dbl> 2.417227, 1.695100, 1.251111, 1.884945, 3.041716, 1.739908,…
$ Shape_Area <dbl> 0.13133873, 0.07926199, 0.05323766, 0.12698345, 0.21393797,…
$ ADM1_EN    <chr> "Bangkok", "Samut Prakan", "Nonthaburi", "Pathum Thani", "P…
$ ADM1_TH    <chr> "กรุงเทพมหานคร", "สมุทรปราการ", "นนทบุรี", "ปทุมธานี", "พระนครศรีอ…
$ ADM1_PCODE <chr> "TH10", "TH11", "TH12", "TH13", "TH14", "TH15", "TH16", "TH…
$ ADM1_REF   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM0_EN    <chr> "Thailand", "Thailand", "Thailand", "Thailand", "Thailand",…
$ ADM0_TH    <chr> "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศ…
$ ADM0_PCODE <chr> "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH",…
$ date       <date> 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18…
$ validOn    <date> 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22…
$ validTo    <date> -001-11-30, -001-11-30, -001-11-30, -001-11-30, -001-11-30…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((674339.8 15..., MULTIPOLYGON (…
adminboundary_bmr <- adminboundary %>% 
  select("ADM1_EN") %>% 
  filter(ADM1_EN %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon"))

After importing the dataset, we will plot it to see how it looks using tmap.

tmap_mode('plot')

tm_shape(adminboundary_bmr)+
  tm_fill(col ="#f4e9e8", 
          alpha = 0.6) +
  tm_borders(col = "#ddafa1",
             lwd = 0.1,  
             alpha = 1) +
  tm_layout(main.title = "BMR Administrative Boundary",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

Let’s plot out the administrative boundaries together with the points of accidents:

tmap_mode('plot')

tm_shape(adminboundary_bmr) +
  tm_fill(col ="#f4e9e8", 
          alpha = 0.6) +
  tm_borders(col = "#ddafa1",
             lwd = 0.1,  
             alpha = 1) +
tm_shape(accidents_bmr) +
  tm_dots(col = "#800200",
          alpha=0.4, 
          size=0.05) +
  tm_layout(main.title = "BMR Administrative Boundary and Accidents",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

2.2.3 Road Lines

The code chunk below imports the hotosm_tha_roads_lines_shp shapefile by using st_read() of the sf package.

roads <- st_read(dsn = "data/geospatial", 
                layer = "hotosm_tha_roads_lines_shp")
roads_sf <- st_set_crs(roads, 4326) %>% 
  st_transform(crs = 32647) %>% 
  st_cast("LINESTRING")
st_crs(roads_sf)

Results of the code above show that the EPSG is indicated as 32647 now.

Upon reviewing the road classifications against the highway descriptions by OpenStreetMap, we observe that not all categories are relevant to our analysis, which focuses primarily on driving networks. We will keep the 13 types of road segments below for the scope of our study.

Name Description
Primary, Primary_Link Major highway linking large towns
Secondary, Secondary_Link Highways which are not part of major routes, but nevertheless form a link in the national route network
Tertiary, Tertiary_Link Roads connecting smaller settlements, and within large settlements for roads connecting local centres
Trunk, Trunk_Link Major highways that do not meet the requirements for motorways, and their link roads (slip roads and ramps)
Motorway, Motorway_Link Highest-performance roads within a territory, generally referred to as motorways, freeways, or expressways.
Residential Road generally used for local traffic within the settlement. Primarily used for access to residential properties but may include access to some non-residential properties (e.g. a corner shop or convenience store).
Unclassified Roads with the lowest priority in the interconnecting road network
Service Provides access to a building, service station, beach, campsite, industrial estate, business park, etc.

In the code chunk below, we will first specify the road classes that we want to retain. Next, we will filter the roads_sf object to remove all rows that do not have one of the desired highway attribute values.

highwaytypes <- c("primary",
                  "primary_link", 
                  "secondary", 
                  "secondary_link", 
                  "tertiary", 
                  "tertiary_link",
                  "trunk",
                  "trunk_link",
                  "motorway",
                  "motorway_link",
                  "residential",
                  "unclassified",
                  "service") 

roads_sf_filtered <- roads_sf %>%
  select("highway") %>% 
  filter(highway %in% highwaytypes)

Since the roads dataset covers areas outside the BMR and our analysis is focused on the region within it, we will clip the road network to the BMR boundary using st_intersection().

roads_bmr <- st_intersection(roads_sf_filtered,
                             adminboundary_bmr)
roads_bmr
Simple feature collection with 555712 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 588649.4 ymin: 1484439 xmax: 712417 ymax: 1579041
Projected CRS: WGS 84 / UTM zone 47N
First 10 features:
          highway ADM1_EN                       geometry
1       secondary Bangkok LINESTRING (693686.1 151979...
2     residential Bangkok LINESTRING (693358 1519300,...
3  secondary_link Bangkok LINESTRING (692949.1 151886...
4         service Bangkok LINESTRING (693256 1519184,...
5       secondary Bangkok LINESTRING (692810.8 151863...
6         service Bangkok LINESTRING (693877.2 152042...
27        service Bangkok LINESTRING (691782.9 152016...
49        service Bangkok LINESTRING (668739.2 152053...
50        service Bangkok LINESTRING (665017.7 151746...
54        service Bangkok LINESTRING (665068 1517492,...

Now that we have scoped the dataset, we can plot the BMR road network.

par(bg = '#E4D5C9', mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr),
     col = "#f4e9e8",
     main = "Road Network in BMR")
plot(st_geometry(roads_bmr),
     add=T,
     col='#800200')

3 Data Wrangling

3.1 Deriving new variables from Accidents data

Using the incident_datetime column, we can also derive additional columns such as seasons, month, day of the week, time of the day (e.g. morning or evening peak periods). The morning rush hour is said to last from 6am to 9am and evening rush hour is reported to be from 4pm to 7pm (The Nation).

I also think it would be interesting to derive a column that indicates accidents that happened during the Songkran festival holiday. Notoriously dubbed the Seven Deadly Days of Songkran, road accidents in Bangkok are said to surge due to increased traffic from people traveling to celebrate with family.

accidents_bmr_extra <- accidents_bmr %>% 
  # Derive month, day, hour columns as well as a Songkran indicator
  mutate(
    month = month(incident_datetime,
                  label = FALSE),
    day = wday(incident_datetime,
               label = FALSE),
    hour = hour(incident_datetime),
    songkran = ifelse(
      as_date(incident_datetime) >= as_date(paste0(year(incident_datetime), "-04-09")) &
        as_date(incident_datetime) <= as_date(paste0(year(incident_datetime), "-04-16")) &
        year(incident_datetime) %in% c(2019, 2020, 2021, 2022),
      1,
      0)) %>%

  # Derive season column and peak period indicator.
  # Note: wday() returns 1 for Sunday by default, so the weekend
  # corresponds to day values 1 (Sunday) and 7 (Saturday).
  mutate(
    weektype = ifelse(
      day %in% c(1, 7),
      "weekend",
      "weekday"
    ),
    season = ifelse(
      month %in% 2:5,
      "Summer",
      ifelse(
        month %in% 6:10,
        "Rainy",
        "Winter"
        )
      ),
    # Morning peak: 6am to before 9am; evening peak: 4pm to before 7pm
    peakperiod = ifelse(
      hour >= 6 & hour < 9,
      "morningpeak",
      ifelse(
        hour >= 16 & hour < 19,
        "eveningpeak",
        "non-peak"
      )
    )
    ) %>% 

  # Drop columns no longer required
  select(-c("acc_code", 
            "incident_datetime", 
            "agency"
            ))

The code chunk above performs the following steps:

  1. The lubridate package within tidyverse is utilised to extract temporal components (month, day of the week, and hour) from the incident_datetime column using the month(), wday(), and hour() functions, respectively.
  2. The songkran indicator is created using ifelse(), along with as_date() and year(), to generate a binary variable (1 for “Yes”, 0 for “No”) indicating whether the accident occurred during the Songkran festival (April 9 to April 16) in the years 2019, 2020, 2021, or 2022.
  3. Two additional indicators are derived using ifelse():
    • season variable: classifies accidents into seasons (“Summer”, “Rainy”, or “Winter”) based on the month.
    • peakperiod variable: identifies whether the accident occurred during morning peak hours (6-9 AM), evening peak hours (4-7 PM), or at non-peak times.
  4. Finally, select() from dplyr is used to keep only the necessary variables for the study.

3.2 Converting sf format into spatstat’s ppp format

In order to use the capabilities of spatstat package, a spatial dataset should be converted into an object of class planar point pattern (ppp). A ppp object contains the spatial coordinates of the points, the marks attached to the points (if any), the window in which the points were observed, and the name of the unit of length for the spatial coordinates. Thus, a single object of class ppp contains all the information required to perform spatial point pattern analysis.

In the previous sections, we created an sf object of accident points. Now, we will convert it into a ppp object using the as.ppp() function from the spatstat package.

The code chunk below converts accidents_bmr_extra into a point pattern object of class ppp. The st_coordinates() function extracts the point coordinates, and the st_bbox() function supplies the bounding box that serves as the initial (rectangular) observation window.

accidents_ppp <- as.ppp(st_coordinates(accidents_bmr_extra),
                        st_bbox(accidents_bmr_extra))

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(accidents_ppp)

3.3 Duplicates check

We will use summary() function to get summary information of accidents_ppp object.

summary(accidents_ppp)
Planar point pattern:  12986 points
Average intensity 1.218049e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 10 decimal places

Window: rectangle = [591277.5, 710166.1] x [1486845.7, 1576520.5] units
                    (118900 x 89670 units)
Window area = 10661300000 square units
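The summary flags duplicated points. Before jittering, we can quantify them with spatstat's duplicated() and multiplicity() methods (an optional check, not part of the original workflow):

```r
# Optional check on the extent of duplication.
# duplicated.ppp() flags points that repeat an earlier point's location;
# multiplicity() returns, for each point, how many points share its
# exact coordinates.
any(duplicated(accidents_ppp))        # TRUE, as reported by summary()
sum(multiplicity(accidents_ppp) > 1)  # number of points involved in ties
```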

3.4 Jittering

To resolve the issue of duplicated points, we apply jittering with the rjitter() function. This adds a small variation to the points, preventing them from occupying the exact same location.

set.seed(1234)
accidents_ppp_jit <- rjitter(accidents_ppp, 
                             retry=TRUE, 
                             nsim=99, 
                             drop=TRUE)

The code below checks the jittered points in the chosen simulation (i.e. Simulation 99) to ensure that no duplicates remain after applying jittering.

any(duplicated(accidents_ppp_jit[["Simulation 99"]]))
[1] FALSE
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

accidents_ppp_jit <- accidents_ppp_jit[["Simulation 99"]]

plot(accidents_ppp_jit)

3.5 Creating Observation Windows

Many data types in spatstat require us to specify the region of space inside which the data were observed. This is the observation window and it is represented by an object of class owin. In this analysis, our study area is BMR, hence we will use BMR boundary as the observation window for spatial point pattern analysis.

To convert our adminboundary_bmr sf object to an owin object, we will use the as.owin() function from the spatstat package.

adminboundary_bmr_owin <- as.owin(adminboundary_bmr)

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot.owin(adminboundary_bmr_owin)

summary(adminboundary_bmr_owin)
Window: polygonal boundary
single connected closed polygon with 13779 vertices
enclosing rectangle: [587893.5, 712440.5] x [1484413.7, 1579076.3] units
                     (124500 x 94660 units)
Window area = 7668990000 square units
Fraction of frame area: 0.65

3.6 Combining ppp and owin objects

In section 3.4, we created our ppp object (accidents_ppp_jit), which represents the spatial points of accident locations. In section 3.5, we created an owin object called adminboundary_bmr_owin, which represents the observation window of our analysis.

The observation window adminboundary_bmr_owin and the point pattern accidents_ppp_jit can be combined, so that the custom window replaces the default rectangular extent (as seen in section 3.2).

acc_bmr_ppp = accidents_ppp_jit[adminboundary_bmr_owin]

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))
plot(acc_bmr_ppp)

4 Exploratory Spatial Data Analysis

Descriptive statistics are used in point pattern analysis to summarise a point pattern’s basic properties, such as its central tendency and dispersion. The mean center and the median center are two frequently used measures of central tendency.

4.1 Measuring Central Tendency

4.1.1 Mean Center

The mean center is the arithmetic average of the (x, y) coordinates of all points in the study area. Like the mean in classical statistics, the mean center is strongly influenced by outliers.

accidents_xy <- st_coordinates(accidents_bmr_extra)
accidents_mc <- apply(accidents_xy, 2, mean)

accidents_mc
        X         Y 
 668399.5 1523495.8 

The results show that the mean center is at (668399.5, 1523495.8).

4.1.2 Median Center

The median center is the location that minimises the sum of distances to all points within the observation window. The procedure begins from an initial point, typically the mean center, and iteratively updates the coordinates (x’, y’) until the optimal value is reached. Unlike the mean center, the median center offers a more robust measure of central tendency because it is far less affected by outliers.
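The iterative procedure described above is essentially the Weiszfeld algorithm. Below is a minimal base-R sketch on synthetic points; note that the code chunk that follows uses the simpler coordinate-wise median, a common approximation.

```r
# Base-R sketch of the iterative (Weiszfeld-style) geometric median.
geometric_median <- function(xy, tol = 1e-8, max_iter = 100) {
  m <- colMeans(xy)                        # start from the mean center
  for (i in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(xy, 2, m)^2))  # distances to current estimate
    d <- pmax(d, tol)                      # guard against division by zero
    m_new <- colSums(xy / d) / sum(1 / d)  # inverse-distance weighted mean
    if (sqrt(sum((m_new - m)^2)) < tol) break
    m <- m_new
  }
  m
}
pts <- rbind(c(0, 0), c(10, 0), c(0, 10), c(10, 10))
geometric_median(pts)                      # symmetric points: near (5, 5)
```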

accidents_medc <- apply(accidents_xy, 2, median)
accidents_medc
        X         Y 
 673446.1 1520755.0 

Based on the results, the median center of accidents is at (673446.1, 1520755.0).

The mean center and median center are similar. This may imply that the distribution of the data is relatively balanced, with no pronounced skew in the spatial pattern of accident points. It also indicates that both measures are effective for analysing the central tendency of the data in this context.

par(bg = '#E4D5C9', mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr), 
     col='#f4e9e8',
     main="Mean and Median Centers of Accidents in BMR")

points(accidents_xy, cex=0.7, pch=21)
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)
points(cbind(accidents_medc[1], accidents_medc[2]), pch='*', col='#bb8bdc', cex=3)

4.2 Measuring Dispersion

4.2.1 Standard Distance

The standard distance is defined analogously to the standard deviation. It measures how dispersed a group of points is around its mean center.

accidents_sd <- sqrt(sum((accidents_xy[,1] - accidents_mc[1])^2 +
                           (accidents_xy[,2] - accidents_mc[2])^2) 
                     / nrow(accidents_xy))

accidents_sd
[1] 27235.14

4.2.2 Plotting Standard Distance

In this section, we will draw a circle centred on the mean center with a radius equal to the standard distance calculated earlier. This provides a visual representation of the dispersion.

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr), col='#f4e9e8', main="Standard Distance of Accidents in BMR")

points(accidents_xy, cex=0.5)
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)

bearing <- 1:360 * pi/180
cx <- accidents_mc[1] + accidents_sd * cos(bearing)
cy <- accidents_mc[2] + accidents_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='#f5347f', lwd=2)

4.3 Spatial Randomness Test

Clark and Evans (1954) proposed a very simple test of spatial randomness based on the Clark-Evans aggregation index (R): the ratio of the observed mean nearest-neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. An R-value > 1 suggests ordering (regularity), while an R-value < 1 suggests clustering.
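As a minimal illustration, the index can be computed by hand in base R on synthetic completely random (CSR) points; clarkevans.test() additionally handles edge corrections and formal inference for us.

```r
# Base-R sketch of the Clark-Evans index R: observed mean nearest-neighbour
# distance over the Poisson expectation 1 / (2 * sqrt(intensity)).
set.seed(42)
n <- 200
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))  # CSR in a 100 x 100 window
d <- as.matrix(dist(xy)); diag(d) <- Inf
obs_nn <- mean(apply(d, 1, min))                 # observed mean NN distance
exp_nn <- 1 / (2 * sqrt(n / (100 * 100)))        # Poisson expectation
R <- obs_nn / exp_nn                             # close to 1 for CSR
```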

We will perform the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of spatstat.

The test hypotheses are:

\(H_0\): The accident points are randomly distributed.

\(H_1\): The accident points are not randomly distributed.

The test will be conducted at the 95% confidence level.

set.seed(1234)
clarkevans.test(acc_bmr_ppp,
                correction="none",
                clipregion=adminboundary_bmr_owin,
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  acc_bmr_ppp
R = 0.22579, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

The Clark-Evans test for the accident points shows an R-value of 0.22579, which is less than 1, indicating a clustered distribution. The p-value is less than 2.2e-16, far below the 0.05 significance level, so we reject the null hypothesis (\(H_0\)) in favour of the alternative hypothesis (\(H_1\)).

Therefore, the statistical inference from this test is that the accident points are not randomly distributed but are clustered. This suggests that there may be underlying factors influencing the spatial distribution of these points.

5 First-Order Spatial Point Pattern Analysis

We can now start to perform first-order spatial point pattern analysis using functions from spatstat package.

First-order properties concern the characteristics of individual point locations and the variation of their density across space. They are mostly addressed by density-based techniques, such as quadrat analysis and kernel density estimation.

Investigation of the intensity of a point pattern is one of the first and most important steps in point pattern analysis. If the point process has an intensity function λ(u), this function can be estimated non-parametrically by kernel estimation. Kernel estimation allows for smoothing of the probability density estimation of a random variable (in this analysis a point event) based on kernels as weights.
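A one-dimensional base-R sketch (with synthetic event positions) shows the idea: each event contributes a kernel bump, and summing the bumps yields an intensity whose integral equals the number of events.

```r
# Base-R sketch of kernel intensity estimation in one dimension.
events <- c(1, 1.5, 2, 8)   # synthetic event locations
sigma  <- 0.5               # smoothing bandwidth
grid   <- seq(-5, 15, by = 0.1)
lambda <- sapply(grid, function(u) sum(dnorm(u, mean = events, sd = sigma)))
sum(lambda) * 0.1           # integral approximately equals 4 events
```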

5.1 Rescaling acc_bmr_ppp

The EPSG:32647 coordinate reference system uses the metre as its standard unit, so acc_bmr_ppp prepared earlier is also in metres. However, we will convert the measurement unit from metres to kilometres when calculating kernel density estimates for the entirety of BMR, because kilometres provide a more appropriate scale for analysing large areas.

acc_bmr_ppp.km <- rescale(acc_bmr_ppp, 1000, "km")

5.2 Computing Default Kernel Density Estimation

KDE is particularly effective in identifying areas with high occurrences of traffic accidents. It divides the study area into cells, overlays a symmetrical, curved surface on each accident location, and aggregates these values within a given radius to estimate density. The density is highest at the accident location and decreases with distance, reflecting a distance decay effect.

To do that, we use density.ppp() from the spatstat package.

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_default <- density(acc_bmr_ppp.km)
plot(kde_default, main = "Default Density KDE")
contour(kde_default, add=TRUE)

The key argument to pass to the density() function for point pattern objects is sigma, which determines the smoothing bandwidth of the kernel. A narrow bandwidth can reveal local fluctuations, useful for detailed analyses, while a larger bandwidth smooths out variations, providing a general overview.

By default, when the sigma value isn’t provided, a bandwidth is determined by a simple rule of thumb that depends only on the size of the window. This default setting might not always give the desired result. In the KDE plot we generated, there’s evidence of over-smoothing, where only one large spatial cluster is visible, potentially hiding smaller clusters or important details.

To address this, we can manually set the bandwidth using the sigma argument or choose a different kernel function through the kernel argument. This will help create more intuitive and detailed KDE maps that better capture the structure of the data.

5.3 KDE Layers with Fixed Bandwidth

5.3.1 Computing Fixed Bandwidths Using Different Bandwidth Selection Methods

Four automatic bandwidth selection methods are available (the code below additionally computes bw.scott.iso(), an isotropic variant of Scott’s rule):

  • bw.diggle(): In Cross-Validated Bandwidth Selection, the bandwidth is chosen to minimise the mean-square error criterion, i.e. the average of the squared differences between the estimated and actual values.

  • bw.CvL(): In the Cronie and van Lieshout’s Criterion for Bandwidth Selection, the bandwidth is chosen to minimise the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point process. This method aims to choose a bandwidth that best represents the underlying point process, taking into account both the observed points and the area they occupy.

  • bw.scott(): In the Scott’s Rule for Bandwidth Selection, the bandwidth is computed by the rule of thumb where the bandwidth is proportional to \(n^{-1/(d+4)}\), where n is the number of points and d is the number of spatial dimensions. This method is useful for estimating gradual trend.

  • bw.ppl(): In the Likelihood Cross Validation Bandwidth Selection, the bandwidth is chosen to maximise the point process likelihood.
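The rule of thumb behind bw.scott() can be sketched in base R. The sketch below uses the classic per-coordinate form, sigma_i = sd_i * n^(-1/(d+4)) = sd_i * n^(-1/6) for d = 2; the exact constant used internally by spatstat may differ slightly.

```r
# Base-R sketch of Scott's rule for d = 2: one bandwidth per axis.
scott_bw <- function(xy) apply(xy, 2, sd) * nrow(xy)^(-1 / 6)
set.seed(1)
xy <- cbind(rnorm(1000, sd = 4.5), rnorm(1000, sd = 3.3))
scott_bw(xy)   # roughly sd * 0.316 per coordinate
```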

bw_diggle <- bw.diggle(acc_bmr_ppp.km)
bw_diggle
     sigma 
0.04025879 
bw_CvL <- bw.CvL(acc_bmr_ppp.km)
bw_CvL
   sigma 
11.55349 
bw_scott <- bw.scott(acc_bmr_ppp.km)
bw_scott
 sigma.x  sigma.y 
4.527996 3.318169 
bw_ppl <- bw.ppl(acc_bmr_ppp.km)
bw_ppl
    sigma 
0.3493275 
bw_scott_iso <- bw.scott.iso(acc_bmr_ppp.km)
bw_scott_iso 
   sigma 
3.876165 
par(bg = '#E4D5C9',
    #mar = c(0,0,1,0),
    mfrow = c(1,2))

plot(bw_diggle, xlim=c(0.0,0.06), ylim=c(-160,100))
plot(bw_CvL)

par(bg = '#E4D5C9',
    #mar = c(0,0,1,0),
    mfrow = c(1,2))

plot(bw_scott, main="bw_scott")

plot(bw_ppl,
     xlim=c(-1,5), 
     ylim=c(00,30000))

5.3.2 Choosing Fixed-Bandwidth KDE

kde_diggle <- density(acc_bmr_ppp.km, bw_diggle)
kde_CvL <- density(acc_bmr_ppp.km, bw_CvL)
kde_scott <- density(acc_bmr_ppp.km, bw_scott)
kde_ppl <- density(acc_bmr_ppp.km, bw_ppl)

par(bg = '#E4D5C9',
    mar = c(1,1,1,1.5),
    mfrow = c(2,2))

plot(kde_diggle,main = "kde_diggle")
plot(kde_CvL,main = "kde_CvL")
plot(kde_scott,main = "kde_scott")
plot(kde_ppl,main = "kde_ppl")

Next, we will plot histograms to compare the distributions of KDE values obtained from the density() function under the different bandwidth selection methods.

par(bg = '#E4D5C9',
    mar = c(2,2,2,2),
    mfrow = c(2,2))

hist(kde_diggle,main = "kde_diggle")
hist(kde_CvL,main = "kde_CvL")
hist(kde_scott,main = "kde_scott")
hist(kde_ppl,main = "kde_ppl")

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott)
plot(kde_fixed_scott,
     main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)

Visual inspection reveals that the bandwidth suggested by the bw_scott method causes noticeable over-smoothing. Although automatic bandwidth selection offers a solid initial estimate, fine-tuning is often required to achieve more accurate results.

To counteract the over-smoothing, we will apply a simple adjustment by reducing the bandwidth by half. This reduction should help capture more detailed patterns in the data and avoid the loss of important spatial information.

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott/2)
plot(kde_fixed_scott,
     main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott/2)")
contour(kde_fixed_scott, add=TRUE)

From the plot above, it seems that reducing the bandwidth (which shrinks the point cluster buffers) has lessened the over-smoothing effect while still clearly highlighting the accident hotspot areas.

5.3.3 Kernel Methods

There are 4 kernel types available in density.ppp(), namely: Gaussian, Epanechnikov, Quartic and Disc. The Gaussian kernel is the default.

The code chunk below computes kernel density estimations using each of these four kernel functions.

kde_fixed_scott.gaussian <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="gaussian")


kde_fixed_scott.epanechnikov <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="epanechnikov")
   
kde_fixed_scott.quartic <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="quartic")
       
   
kde_fixed_scott.disc <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="disc")
         
par(bg = '#E4D5C9',
    mar = c(0,0,1,0),
    mfrow = c(2,2))

plot(kde_fixed_scott.gaussian, main="Gaussian")
plot(kde_fixed_scott.epanechnikov, main="Epanechnikov")
plot(kde_fixed_scott.quartic, main="Quartic")
plot(kde_fixed_scott.disc, main="Disc")

5.4 KDE Layers with Spatially Adaptive Bandwidth

The bandwidth of a kernel estimator can either be fixed across the entire mapping area or adapt to local conditions. The fixed-bandwidth method explored in our earlier analysis is sensitive to highly skewed distributions of spatial point patterns across geographical units, for example urban versus rural areas. One way to overcome this problem is to use an adaptive bandwidth instead.

In the section below, we compare the fixed with adaptive bandwidth-based KDE, and how they were able to detect accident hot spots.

adaptive.density() of spatstat offers 3 estimation methods:

  1. method = “voronoi”: estimates the intensity using the Voronoi-Dirichlet tessellation
  2. method = “kernel”: estimates the intensity using a variable-bandwidth kernel estimator
  3. method = “nearest”: estimates the intensity using the distance from each spatial location to its kth nearest point

kde_adaptive_vd <- adaptive.density(acc_bmr_ppp.km, 
                                    method = "voronoi")
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_vd,
     main = "Voronoi-Dirichlet Adaptive Density Estimate")

kde_adaptive_kernel <- adaptive.density(acc_bmr_ppp.km, 
                                        method = "kernel")
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_kernel,
     main = "Adaptive Kernel Density Estimate")

kde_adaptive_knn <- adaptive.density(acc_bmr_ppp.km,
                                     method = "nearest",
                                     k = 100)
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_knn,
     main = "Nearest-Neighbour Adaptive Density Estimate")

5.4.1 Choosing Adaptive KDE Method

Just as we did with the fixed bandwidths, we can create histograms to compare the distributions of KDE values obtained from the adaptive.density() function using its different estimation methods.

par(bg = '#E4D5C9', 
    mar = c(2,2,2,2),
    mfrow = c(2,2))

hist(kde_adaptive_vd, main = "Voronoi-Dirichlet Adaptive")
hist(kde_adaptive_kernel, main = "Adaptive Kernel")
hist(kde_adaptive_knn, main = "Nearest-Neighbour Adaptive")

5.5 Plotting Interactive KDE Maps

raster_kde_fixed_scott <- raster(kde_fixed_scott)
raster_kde_adaptive_nn <- raster(kde_adaptive_knn)
raster_kde_adaptive_kernel <- raster(kde_adaptive_kernel)

projection(raster_kde_fixed_scott) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_nn) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_kernel) <- CRS("+init=EPSG:32647 +units=km")
tmap_mode('view')
kde_fixed_scott <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_fixed_scott) +
  tm_raster("layer",
            n = 10,
            title = "KDE_Fixed_scott",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

kde_adaptive_nn <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_adaptive_nn) +
  tm_raster("layer",
            n = 7,
            title = "KDE_Adaptive_nn",
            style = "pretty",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

kde_adaptive_kernel <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_adaptive_kernel) +
  tm_raster("layer",
            n = 7,
            title = "KDE_Adaptive_Kernel",
            style = "pretty",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

tmap_arrange(kde_fixed_scott, 
             kde_adaptive_nn, 
             kde_adaptive_kernel,
             ncol=1,
             nrow=3,
             sync = TRUE)
tmap_mode('plot')

5.6 Province-Level KDE

For a more detailed look, we will create province-level KDE maps for the two provinces identified earlier: Bangkok and Samut Prakan. To create such maps, we will carry out additional data wrangling as required.

Firstly, we will filter out each province as a separate sf object from adminboundary_bmr.

bkk = adminboundary_bmr %>% filter(ADM1_EN == "Bangkok")
spk = adminboundary_bmr %>% filter(ADM1_EN == "Samut Prakan")

par(bg = '#E4D5C9',
    mar = c(1,1,1,0),
    mfrow=c(1,2))
plot(st_geometry(bkk), main = "Bangkok")
plot(st_geometry(spk), main = "Samut Prakan")

Next, we will create owin objects to represent the observation windows for each province. Once the owin objects are created, we will also extract the accident locations falling within each observation window from the original acc_bmr_ppp object.

bkk_owin = as.owin(bkk)
spk_owin = as.owin(spk)

acc_bkk_ppp = acc_bmr_ppp[bkk_owin]
acc_spk_ppp = acc_bmr_ppp[spk_owin]

Now that we have prepared both owin and ppp objects for each province, we are ready to plot KDE maps. As in the previous section, we will try both fixed-bandwidth and adaptive-bandwidth KDE maps.

5.6.1 Province-level Fixed Bandwidth KDE Maps

# sigma=bw.scott passes the selector function itself, so the bandwidth
# is recomputed for each province's point pattern
bkk_kde_scott <- density(acc_bkk_ppp, sigma=bw.scott)
spk_kde_scott <- density(acc_spk_ppp, sigma=bw.scott)

par(bg = '#E4D5C9', 
    mar = c(1,1,1,1.5),
    mfrow = c(1,2))

plot(bkk_kde_scott,
     main = "Fixed KDE - Bangkok")
contour(bkk_kde_scott, 
        add=TRUE)

plot(spk_kde_scott,
     main = "Fixed KDE - Samut Prakan")
contour(spk_kde_scott, 
        add=TRUE)

5.6.2 Province-level Adaptive-Bandwidth KDE Maps

bkk_kde_adaptive_kernel <- adaptive.density(acc_bkk_ppp, method = "kernel")
spk_kde_adaptive_kernel <- adaptive.density(acc_spk_ppp, method = "kernel")


par(bg = '#E4D5C9', mar = c(1,1,1,1.5),mfrow = c(1,2))

plot(bkk_kde_adaptive_kernel,
     main = "Adaptive KDE - Bangkok")
contour(bkk_kde_adaptive_kernel, 
        add=TRUE)

plot(spk_kde_adaptive_kernel,
     main = "Adaptive KDE - Samut Prakan")
contour(spk_kde_adaptive_kernel, 
        add=TRUE)

6 Network Constrained Kernel Density Estimation (NKDE)

In real-world scenarios, events like accidents tend to follow specific networks, such as road systems, rather than being spread across open space. Traditional Kernel Density Estimation (KDE) assumes events occur in a continuous two-dimensional plane, which does not accurately reflect network-based patterns like road traffic. Since movement on these networks is confined to one-dimensional paths, Network Constrained Kernel Density Estimation (NKDE) extends spatial KDE to estimate density along the network itself.

NKDE estimates event density strictly along the network, dividing roads into small units called “lixels” (like one-dimensional pixels) and calculating event density at the center of these segments. Instead of using straight-line (Euclidean) distances between events, NKDE measures the shortest path along the network. This method adjusts the intensity estimate to reflect density along a road rather than across an area, providing a clearer interpretation.

In this section, we’ll focus on generating NKDE maps for the two provinces identified earlier, using the spNetwork package to better understand these road-based event patterns.

6.1 Extracting Road Networks for Bangkok and Samut Prakan

First, we will have to extract the road network details and accidents pertaining to our 2 provinces.

The code chunk below performs the following functions:

  • st_intersection() is used to find the geometries that overlap between roads_bmr (the road network data for the entire Bangkok Metropolitan Region) and the boundary of Bangkok or Samut Prakan. st_union() combines all individual geometries within each provincial boundary into a single geometry before the intersection, ensuring that the entire boundary is treated as a single spatial unit.
  • After the intersection, filter() ensures that only LINESTRING geometries (i.e., line segments representing roads) are retained. This step removes any points or other geometries that may have resulted from the intersection.
  • Similar to the road network extraction, the st_intersection() function is used to find the accidents (from accidents_bmr_extra, which contains accident data for the BMR) that fall within the Bangkok and Samut Prakan boundaries.
# Road Network
bkk_network <- st_intersection(roads_bmr, st_union(bkk)) %>% 
  filter(st_geometry_type(.) == "LINESTRING")

spk_network <- st_intersection(roads_bmr, st_union(spk)) %>% 
  filter(st_geometry_type(.) == "LINESTRING")

write_rds(bkk_network, "data/rds/bkk_network.rds")
write_rds(spk_network, "data/rds/spk_network.rds")
# Accidents
bkk_acc <- st_intersection(accidents_bmr_extra, bkk)
spk_acc <- st_intersection(accidents_bmr_extra, spk)

write_rds(bkk_acc, "data/rds/bkk_acc.rds")
write_rds(spk_acc, "data/rds/spk_acc.rds")

6.2 Preparing lixel objects

Before calculating the NKDE, the road network must be segmented into smaller units, called lixels, using the lixelize_lines() function from the spNetwork package. A lixel represents a linear pixel along the road network, which is used as a basis for estimating spatial density.

In the code chunk below, each lixel is set to a length of 500 meters, while the minimum length (mindist) is set to 250 meters. This means that if the final lixel in a segment is shorter than 250 meters, it will be merged with the preceding lixel. Lixels that are already shorter than the minimum distance remain unchanged.

Since we have narrowed down to the 2 provinces where road traffic accidents can be highly clustered around certain hotspots, a smaller lixel size (i.e., 500 meters) might offer more detailed insights into local variations in accident density. Larger lixels, like 1000 meters, can potentially smooth out these variations, missing critical hotspots, especially in densely populated urban areas.

bkk_lixels <- lixelize_lines(bkk_network, 
                         500, 
                         mindist = 250)

spk_lixels <- lixelize_lines(spk_network, 
                         500, 
                         mindist = 250)

write_rds(bkk_lixels, "data/rds/bkk_lixels.rds")
write_rds(spk_lixels, "data/rds/spk_lixels.rds")

6.3 Generating line centre points

Next, lines_center() of spNetwork will be used to generate the line centre points. Each sample point is located at the centre of its lixel, based on the length of the line.

bkk_samples <- lines_center(bkk_lixels)
spk_samples <- lines_center(spk_lixels)

6.4 NKDE Calculation

spNetwork package supports 3 NKDE methods: simple, discontinuous and continuous. Gelb (2021) summarises the 3 methods as such:

  • Simple: This method replaces Euclidean distances with network distances to estimate density along the network. It adapts the kernel formula to calculate density over a linear unit rather than an areal unit, making it intuitive but potentially biased.

  • Discontinuous: This method addresses the bias in the simple method by ensuring the kernel function divides the density mass equally at intersections. It conserves mass, making it a more accurate estimator, but the density becomes discontinuous.

  • Continuous: To overcome the discontinuity, this method adjusts the density before intersections to make the function continuous while still conserving mass, offering a more intuitive and smooth result.

In the following code chunks, we will compare NKDE values across the three methods. After several attempts, we optimised the computation time by adjusting certain parameters: reducing the bandwidth (bw) to 500 metres, setting the maximum depth (max_depth) to 8, aggregating events that fall within 5 metres of each other (agg = 5), and specifying a grid_shape of c(10, 10). With the gridded application of NKDE, the calculation is performed separately in each cell of the grid. These adjustments were necessary due to the large size of our dataset.

bkk_nkde_simple <- nkde(bkk_network,
                        events = bkk_acc,
                        w = rep(1, nrow(bkk_acc)),
                        samples = bkk_samples,
                        kernel_name = "quartic",
                        bw = 500, #<<
                        div= "bw",
                        method = "simple",
                        digits = 1,
                        tol = 1,
                        grid_shape = c(10,10),
                        max_depth = 8, #<<
                        agg = 5, #<<
                        sparse = TRUE,
                        verbose = FALSE,
                        study_area = bkk)
spk_nkde_simple <- nkde(spk_network,
                        events = spk_acc,
                        w = rep(1, nrow(spk_acc)),
                        samples = spk_samples,
                        kernel_name = "quartic",
                        bw = 500, 
                        div= "bw",
                        method = "simple",
                        digits = 1,
                        tol = 1,
                        grid_shape = c(10,10),
                        max_depth = 8,
                        agg = 5,
                        sparse = TRUE,
                        verbose = FALSE,
                        study_area = spk)
bkk_nkde_discon <- nkde(bkk_network,
                        events = bkk_acc,
                        w = rep(1, nrow(bkk_acc)),
                        samples = bkk_samples,
                        kernel_name = "quartic",
                        bw = 500, 
                        div= "bw",
                        method = "discontinuous",
                        digits = 1,
                        tol = 1,
                        grid_shape = c(10,10),
                        max_depth = 8,
                        agg = 5,
                        sparse = TRUE,
                        verbose = FALSE,
                        study_area = bkk)
spk_nkde_discon <- nkde(spk_network,
                        events = spk_acc,
                        w = rep(1, nrow(spk_acc)),
                        samples = spk_samples,
                        kernel_name = "quartic",
                        bw = 500, 
                        div= "bw",
                        method = "discontinuous",
                        digits = 1,
                        tol = 1,
                        grid_shape = c(10,10),
                        max_depth = 8,
                        agg = 5,
                        sparse = TRUE,
                        verbose = FALSE,
                        study_area = spk)
bkk_nkde_cont <- nkde(bkk_network,
                      events = bkk_acc,
                      w = rep(1, nrow(bkk_acc)),
                      samples = bkk_samples,
                      kernel_name = "quartic",
                      bw = 500, 
                      div= "bw",
                      method = "continuous",
                      digits = 1,
                      tol = 1,
                      grid_shape = c(10,10),
                      max_depth = 8,
                      agg = 5,
                      sparse = TRUE,
                      verbose = FALSE,
                      study_area = bkk)
spk_nkde_cont <- nkde(spk_network,
                      events = spk_acc,
                      w = rep(1, nrow(spk_acc)),
                      samples = spk_samples,
                      kernel_name = "quartic",
                      bw = 500, 
                      div= "bw",
                      method = "continuous",
                      digits = 1,
                      tol = 1,
                      grid_shape = c(10,10),
                      max_depth = 8,
                      agg = 5,
                      sparse = TRUE,
                      verbose = FALSE,
                      study_area = spk)

6.5 Visualising NKDE

Before we can visualise the NKDE values, we have to patch in the computed density values into samples and lixels.

To enhance the readability of the results, we first multiply the obtained densities by the total number of accidents. This adjustment ensures that the spatial integral equals the number of events. Subsequently, we multiply this value by 1000 to derive the estimated number of accidents per kilometer. This process will provide a more intuitive understanding of the density distribution along the network.
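A quick numeric sketch of this rescaling, using hypothetical values (EPSG:32647 units are metres, so raw NKDE values are per metre):

```r
# Hypothetical numbers illustrating the rescaling described above.
density_per_m <- 2e-4        # raw NKDE value at one sample point (per metre)
n_events      <- 5000        # hypothetical accident count for the province
density_per_m * n_events * 1000   # expected accidents per kilometre
```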

# Bangkok
bkk_samples$nkde_simple <- bkk_nkde_simple*nrow(bkk_acc)*1000
bkk_lixels$nkde_simple <- bkk_nkde_simple*nrow(bkk_acc)*1000

bkk_samples$nkde_discont <- bkk_nkde_discon*nrow(bkk_acc)*1000
bkk_lixels$nkde_discont <- bkk_nkde_discon*nrow(bkk_acc)*1000

bkk_samples$nkde_cont <- bkk_nkde_cont*nrow(bkk_acc)*1000
bkk_lixels$nkde_cont <- bkk_nkde_cont*nrow(bkk_acc)*1000
# Samut Prakan
spk_samples$nkde_simple <- spk_nkde_simple*nrow(spk_acc)*1000
spk_lixels$nkde_simple <- spk_nkde_simple*nrow(spk_acc)*1000

spk_samples$nkde_discont <- spk_nkde_discon*nrow(spk_acc)*1000
spk_lixels$nkde_discont <- spk_nkde_discon*nrow(spk_acc)*1000

spk_samples$nkde_cont <- spk_nkde_cont*nrow(spk_acc)*1000
spk_lixels$nkde_cont <- spk_nkde_cont*nrow(spk_acc)*1000

tmap_mode('view')
bkk_nkde_simple_map <- tm_basemap("OpenStreetMap") +
  tm_shape(bkk_lixels)+
  tm_lines(col="nkde_simple", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
           id="nkde_simple")+
  tm_shape(bkk_acc)+
  tm_dots(size=0.01)

bkk_nkde_discon_map <- tm_basemap("OpenStreetMap") +
  tm_shape(bkk_lixels)+
  tm_lines(col="nkde_discont", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
           id="nkde_discont")+
  tm_shape(bkk_acc)+
  tm_dots(size=0.01) 

bkk_nkde_cont_map <- tm_basemap("OpenStreetMap") +
  tm_shape(bkk_lixels)+
  tm_lines(col="nkde_cont", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"),
           id="nkde_cont")+
  tm_shape(bkk_acc)+
  tm_dots(size=0.01)


tmap_arrange(bkk_nkde_simple_map, 
             bkk_nkde_discon_map, 
             bkk_nkde_cont_map,
             ncol=3,
             nrow=1,
             sync = TRUE
             )
Show the code

tmap_mode('view')
spk_nkde_simple_map <- tm_basemap("OpenStreetMap") +
  tm_shape(spk_lixels)+
  tm_lines(col="nkde_simple", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"), 
           id="nkde_simple")+
  tm_shape(spk_acc)+
  tm_dots(size=0.01) 

spk_nkde_discon_map <- tm_basemap("OpenStreetMap") +
  tm_shape(spk_lixels)+
  tm_lines(col="nkde_discont", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"), 
           id="nkde_discont")+
  tm_shape(spk_acc)+
  tm_dots(size=0.01)

spk_nkde_cont_map <- tm_basemap("OpenStreetMap") +
  tm_shape(spk_lixels)+
  tm_lines(col="nkde_cont", 
           lwd = 2, 
           palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e"), 
           id="nkde_cont")+
  tm_shape(spk_acc)+
  tm_dots(size=0.01)

tmap_arrange(spk_nkde_simple_map, 
             spk_nkde_discon_map, 
             spk_nkde_cont_map,
             ncol=3,
             nrow=1,
             sync = TRUE
             )

7 Temporal Network Kernel Density Estimation (TNKDE)

Temporal Network Kernel Density Estimation (TNKDE) is an advanced spatial analysis technique that integrates both spatial and temporal dimensions to provide a more nuanced understanding of event distributions along networks. Unlike traditional NKDE, which focuses solely on spatial data, TNKDE accounts for time-based variations such as time of day, day of the week, or even seasonal trends. This allows for a more detailed analysis of how events, like traffic accidents, fluctuate over both space and time.

The purpose of TNKDE is to capture temporal patterns within spatial networks, offering insights into how event intensities shift at different times. By incorporating time, TNKDE helps uncover spatio-temporal relationships that may be overlooked in a purely spatial analysis. This makes it especially useful for understanding dynamic events, like traffic accidents, where timing plays a critical role. Ultimately, TNKDE supports more informed decision-making for urban planning, road safety improvements, and traffic management by identifying accident hotspots based on specific temporal contexts.
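The core idea can be sketched in a few lines of base R: each event contributes the product of a spatial kernel and a temporal kernel. This is a toy illustration only, not spNetwork's implementation, and Euclidean distance stands in for true network distance:

```r
# Quartic (biweight) kernel, the same kernel family used in our NKDE/TNKDE calls
quartic <- function(u) ifelse(abs(u) <= 1, 15 / 16 * (1 - u^2)^2, 0)

# Toy space-time product-kernel density at location (x, y) and time t.
# Euclidean distance is a stand-in for network distance here.
tnkde_toy <- function(x, y, t, ev_x, ev_y, ev_t, bw_net, bw_time) {
  d_net <- sqrt((x - ev_x)^2 + (y - ev_y)^2)
  sum(quartic(d_net / bw_net) * quartic((t - ev_t) / bw_time)) /
    (bw_net * bw_time)
}

# Three hypothetical accidents: coordinates in metres, time as hour of day
ev_x <- c(100, 150, 800); ev_y <- c(200, 260, 900); ev_t <- c(8, 9, 18)

# The same location reads as a hotspot at 08:30 but not at 18:00
morning <- tnkde_toy(120, 220, 8.5, ev_x, ev_y, ev_t, bw_net = 500, bw_time = 2)
evening <- tnkde_toy(120, 220, 18,  ev_x, ev_y, ev_t, bw_net = 500, bw_time = 2)
c(morning = morning, evening = evening)
```

This is exactly the behaviour TNKDE exploits: the estimated intensity at a fixed location changes with the time of evaluation.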

7.1 Visualising Distribution

Recall the variables we derived in Section 3.1. Our TNKDE analysis will be focused on these factors:

flowchart TD
    A[Variables] --> A1[Month] 
    A1 -.-> A11[Jan to Dec]
    A --> A2[Day]
    A2 -.-> A21[Mon to Sun]
    A --> A3[Hour]
    A3 -.-> A31[00 to 23 hr]
    A --> A4[Songkran?]
    A4 -.-> A41[Yes]
    A4 -.-> A42[No]
    A --> A5[Weektype]
    A5 -.-> A51[Weekday]
    A5 -.-> A52[Weekend]
    A --> A6[Season]
    A6 -.-> A61[Summer]
    A6 -.-> A62[Rainy]
    A6 -.-> A63[Winter]
    A --> A7[Peak Period]
    A7 -.-> A71[Morning Peak] 
    A7 -.-> A72[Evening Peak]    
    A7 -.-> A73[Non-peak]    
    

Let’s visualise the distribution:

Show the code
plot_month <- ggplot(data = bkk_acc,
                        aes(x = month)) +
  geom_bar()+
  ylim(0, 800) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2) +
  labs(x = "Month",
         y = "No. of Accidents",
         title = "Accidents by Month") +
  scale_x_continuous(breaks = 1:12) +
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_day <- ggplot(data = bkk_acc,
                        aes(x = day)) +
  geom_bar()+
  ylim(0, 1200) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2) +
labs(x = "Day of Week",
         y = "No. of Accidents",
         title = "Accidents by Day of Week") + 
  scale_x_continuous(breaks = 1:7) +
  theme_minimal() +
  theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_hour <- ggplot(data = bkk_acc,
                        aes(x = hour)) +
  geom_bar()+
  ylim(0, 700) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2,
            angle = 45,
            hjust = -0.25) +
    labs(x = "Hour",
         y = "No. of Accidents",
         title = "Accidents by Time of Day") + 
    scale_x_continuous(breaks = 0:23) +
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

gridExtra::grid.arrange(plot_month, plot_day,
                        plot_hour, 
                        ncol=1, nrow = 3)

Show the code
plot_songkran <- ggplot(data = bkk_acc,
                        aes(x = reorder(songkran, songkran, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 13500) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Songkran?",
         y = "No. of Accidents",
         title = "Accidents During Songkran") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_text(size = 6),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) +
  scale_x_discrete(labels=c('No', 'Yes'))

plot_weektype <- ggplot(data = bkk_acc,
                        aes(x = reorder(weektype, weektype, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 6000) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Weektype",
         y = "No. of Accidents",
         title = "Accidents by Weektype") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_season <- ggplot(data = bkk_acc,
                        aes(x = reorder(season, season, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 6000) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Season",
         y = "No. of Accidents",
         title = "Accidents by Season") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_text(size = 6),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_peakperiod <- ggplot(data = bkk_acc,
                        aes(x = reorder(peakperiod, peakperiod, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 6000) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Peak Period",
         y = "No. of Accidents",
         title = "Accidents by Peak/Non-Peak Period") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

gridExtra::grid.arrange(plot_songkran, plot_weektype,
                        plot_season, plot_peakperiod,
                        ncol=2, nrow = 2)

Show the code
plot_month_spk <- ggplot(data = spk_acc,
                        aes(x = month)) +
  geom_bar()+
  ylim(0, 300) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2) +
  labs(x = "Month",
         y = "No. of Accidents",
         title = "Accidents by Month") +
  scale_x_continuous(breaks = 1:12) +
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_day_spk <- ggplot(data = spk_acc,
                        aes(x = day)) +
  geom_bar()+
  ylim(0, 500) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2) +
  labs(x = "Day of Week",
         y = "No. of Accidents",
         title = "Accidents by Day of Week") + 
  scale_x_continuous(breaks = 1:7) +
  theme_minimal() +
  theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_hour_spk <- ggplot(data = spk_acc,
                        aes(x = hour)) +
  geom_bar()+
  ylim(0, 300) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=2,
            angle = 45,
            hjust = -0.25) +
    labs(x = "Hour",
         y = "No. of Accidents",
         title = "Accidents by Time of Day") + 
    scale_x_continuous(breaks = 0:23) +
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

gridExtra::grid.arrange(plot_month_spk, plot_day_spk,
                        plot_hour_spk, 
                        ncol=1, nrow = 3)

Show the code
plot_songkran_spk <- ggplot(data = spk_acc,
                        aes(x = reorder(songkran, songkran, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 3000) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Songkran?",
         y = "No. of Accidents",
         title = "Accidents During Songkran") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_text(size = 6),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) +
  scale_x_discrete(labels=c('No', 'Yes'))

plot_weektype_spk <- ggplot(data = spk_acc,
                        aes(x = reorder(weektype, weektype, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 2000) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Weektype",
         y = "No. of Accidents",
         title = "Accidents by Weektype") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_season_spk <- ggplot(data = spk_acc,
                        aes(x = reorder(season, season, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 1500) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Season",
         y = "No. of Accidents",
         title = "Accidents by Season") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_text(size = 6),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

plot_peakperiod_spk <- ggplot(data = spk_acc,
                        aes(x = reorder(peakperiod, peakperiod, function(x)-length(x)))) +
  geom_bar()+
  ylim(0, 2500) + 
  geom_text(stat="count", 
            aes(label=paste0(after_stat(count), ", ",
                             round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
            vjust=-1,
            size=3) +
    labs(x = "Peak Period",
         y = "No. of Accidents",
         title = "Accidents by Peak/Non-Peak Period") + 
    theme_minimal() +
    theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title.y = element_blank(),
      axis.title.x = element_blank(),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9")
    ) 

gridExtra::grid.arrange(plot_songkran_spk, plot_weektype_spk,
                        plot_season_spk, plot_peakperiod_spk,
                        ncol=2, nrow = 2)

7.2 Temporal Dimension

Next, we will compute kernel density estimates of accidents over time using multiple bandwidths with the tnkde() function from the spNetwork package. TNKDE requires two bandwidths: one for spatial data and one for temporal data. In this section, we will explore three different methods for selecting bandwidths:

  • bw_bcv: Utilises biased cross-validation to find the bandwidth that minimises estimation errors.
  • bw_ucv: Implements unbiased cross-validation for bandwidth selection.
  • bw_SJ: Estimates bandwidths using pilot estimation of derivatives.
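All three selectors (bw.bcv(), bw.ucv() and bw.SJ()) live in base R's stats package, so the idea can be previewed on simulated hour data. The values below are synthetic; in the actual analysis the selectors are applied to bkk_acc$hour and spk_acc$hour:

```r
set.seed(42)
# Synthetic accident hours with a morning and an evening peak
hours <- c(rnorm(300, mean = 8, sd = 1.5), rnorm(300, mean = 18, sd = 2))

# Each selector returns a single bandwidth value (in hours)
c(bcv = bw.bcv(hours), ucv = bw.ucv(hours), SJ = bw.SJ(hours))
```

Note that the three methods generally return different bandwidths, which is why we compare the resulting density curves below.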

Steps:

  1. We will start by creating a vector called weight that will have the same number of elements as rows in the bkk_acc dataset, which will be used as the weighting factor in the TNKDE calculations.
  2. After that, we’ll generate a sequence of numbers ranging from 0 to the maximum value in the hour column of the bkk_acc data frame, with intervals of 0.5. These points will serve as the sample locations for estimating the density.
  3. Next, we’ll calculate the bandwidth values for the hour column using the three bandwidth selection methods mentioned earlier.
  4. Once the bandwidth values are obtained, we will perform the temporal KDE using the tkde() function from the spNetwork package and store the resulting density estimates in a data frame called tk.
  5. Finally, we will apply the melt function from the reshape2 package to reshape the tk data frame into a new data frame named df_hour, using the hour variable as the identifier.
## Bangkok
weight_bkk <- rep(1, nrow(bkk_acc))
samples_bkk <- seq(0, max(bkk_acc$hour), 0.5)
bw_bcv_bkk <- bw.bcv(bkk_acc$hour)
bw_ucv_bkk <- bw.ucv(bkk_acc$hour)
bw_SJ_bkk <- bw.SJ(bkk_acc$hour)

## Samut Prakan
weight_spk <- rep(1, nrow(spk_acc))
samples_spk <- seq(0, max(spk_acc$hour), 0.5)
bw_bcv_spk <- bw.bcv(spk_acc$hour)
bw_ucv_spk <- bw.ucv(spk_acc$hour)
bw_SJ_spk <- bw.SJ(spk_acc$hour)
## Bangkok
tk_bkk <- data.frame(
  bw_bcv = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_bcv_bkk, kernel_name = "quartic"),
  bw_ucv = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_ucv_bkk, kernel_name = "quartic"),
  bw_SJ = tkde(bkk_acc$hour, w = weight_bkk, samples = samples_bkk, bw = bw_SJ_bkk, kernel_name = "quartic"),
  hour = samples_bkk
)

df_hour_bkk <- reshape2::melt(tk_bkk,
                          id.vars = "hour")

## Samut Prakan
tk_spk <- data.frame(
  bw_bcv = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_bcv_spk, kernel_name = "quartic"),
  bw_ucv = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_ucv_spk, kernel_name = "quartic"),
  bw_SJ = tkde(spk_acc$hour, w = weight_spk, samples = samples_spk, bw = bw_SJ_spk, kernel_name = "quartic"),
  hour = samples_spk
)

df_hour_spk <- reshape2::melt(tk_spk,
                          id.vars = "hour")
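For intuition, both the quartic temporal estimator and the wide-to-long reshape can be hand-rolled in base R. This is an illustrative sketch with made-up hours, not spNetwork's or reshape2's implementation:

```r
# Quartic kernel, as used in the tkde() calls above
quartic <- function(u) ifelse(abs(u) <= 1, 15 / 16 * (1 - u^2)^2, 0)

# Hand-rolled temporal KDE: weighted quartic kernel evaluated at each sample
tkde_toy <- function(events, samples, bw, w = rep(1, length(events))) {
  sapply(samples, function(s) sum(w * quartic((s - events) / bw)) / (sum(w) * bw))
}

hours <- c(7, 8, 8, 9, 17, 18, 18, 19, 23)  # hypothetical accident hours
samples <- seq(0, 23, 0.5)
tk <- data.frame(hour = samples,
                 bw2 = tkde_toy(hours, samples, bw = 2),
                 bw4 = tkde_toy(hours, samples, bw = 4))

# Base-R equivalent of reshape2::melt(tk, id.vars = "hour")
df_hour <- data.frame(hour = rep(tk$hour, 2),
                      variable = rep(c("bw2", "bw4"), each = nrow(tk)),
                      value = c(tk$bw2, tk$bw4))
head(df_hour)
```

The long format is what lets ggplot2 facet one density curve per bandwidth in the plots that follow.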
Show the code
plot_tk_bkk <- ggplot(data = df_hour_bkk,
                  aes(x = hour, y = value)) +
  geom_line()+
  theme_minimal() +
  theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title = element_text(size = 8),
      axis.text = element_text(size = 6),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9"),
      strip.background=element_rect(fill="#f4e9e8")
    ) +
  facet_wrap(vars(variable), ncol=1, nrow=3, scales = "free") +
  scale_x_continuous(breaks = 0:23)

plot_tk_bkk

Show the code
plot_tk_spk <- ggplot(data = df_hour_spk,
                  aes(x = hour, y = value)) +
  geom_line()+
  theme_minimal() +
  theme(
      plot.title = element_text(face="bold", size = 8),
      axis.title = element_text(size = 8),
      axis.text = element_text(size = 6),
      plot.background=element_rect(fill="#E4D5C9",colour="#E4D5C9"),
      strip.background=element_rect(fill="#f4e9e8")
    ) +
  facet_wrap(vars(variable), ncol=1, nrow=3, scales = "free") +
  scale_x_continuous(breaks = 0:23)

plot_tk_spk

7.3 Optimal Bandwidths

For effective implementation of TNKDE, it’s essential to consider both spatial and temporal dimensions. To achieve this, we can use the leave-one-out cross-validation method to determine appropriate bandwidths for both aspects. The bw_tnkde_cv_likelihood_calc() function from spNetwork can be employed to evaluate the cross-validation likelihood for various bandwidths, allowing us to select the most fitting options for both the network and time dimensions based on data-driven insights.
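The leave-one-out idea itself is simple: for each candidate bandwidth, score how well the density estimated from all other events predicts each held-out event, and sum the log-densities. The one-dimensional sketch below illustrates that idea only; it is not bw_tnkde_cv_likelihood_calc()'s implementation, and the hours are made up:

```r
quartic <- function(u) ifelse(abs(u) <= 1, 15 / 16 * (1 - u^2)^2, 0)

# Leave-one-out log-likelihood for one temporal bandwidth (1-D illustration)
loo_loglik <- function(events, bw) {
  n <- length(events)
  sum(sapply(seq_len(n), function(i) {
    dens <- sum(quartic((events[i] - events[-i]) / bw)) / ((n - 1) * bw)
    log(dens + 1e-12)  # guard against log(0) for isolated events
  }))
}

hours <- c(7, 8, 8, 9, 9, 10, 17, 18, 18, 19)
scores <- sapply(c(1, 2, 4, 8), function(bw) loo_loglik(hours, bw))
names(scores) <- c(1, 2, 4, 8)
scores  # pick the bandwidth with the highest (least negative) score
```

In the real function the same scoring is done jointly over a grid of network and time bandwidths, producing the score matrices shown below.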

In the code chunk below, the following arguments are used:

  • bws_net: an ordered numeric vector of network bandwidths to test, defined by a range (start, end) and a step size.
  • bws_time: an ordered numeric vector of time bandwidths to test, likewise defined by a range and step size.
  • lines: linestrings representing the underlying road network.
  • events: points representing events on the network, i.e. our accident locations.
  • time_field: the hour column derived in our earlier calculation.
  • agg, max_depth and grid_shape are kept consistent with our NKDE calculations.

cv_bkk <- bw_tnkde_cv_likelihood_calc(
  bws_net = seq(100,1000,100),
  bws_time = seq(10,60,5),
  lines = bkk_network,
  events = bkk_acc,
  time_field = "hour",
  w = rep(1, nrow(bkk_acc)),
  kernel_name = "quartic",
  method = "discontinuous", 
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 8,
  digits = 1,
  tol = 0.1,
  agg = 5,
  sparse=TRUE,
  grid_shape=c(10,10),
  sub_sample=1,
  verbose = FALSE,
  check = TRUE)

write_rds(cv_bkk, "data/rds/cv_bkk.rds")

The output is a matrix of cross-validation log-likelihood scores, with rows corresponding to network bandwidths (in metres) and columns to time bandwidths; higher (less negative) scores indicate a better fit:

10 15 20 25 30 35 40 45 50 55 60
100 -181.34325 -150.86581 -139.77335 -136.38724 -136.61409 -136.82656 -137.02051 -137.19712 -137.35844 -137.50654 -137.64319
200 -115.58330 -96.21737 -88.81100 -87.26863 -87.52025 -87.75267 -87.96406 -88.15628 -88.33176 -88.49280 -88.64138
300 -88.22241 -72.98540 -68.66435 -66.12972 -66.39108 -66.63200 -66.85092 -67.04990 -67.23152 -67.39818 -67.55193
400 -70.82133 -60.13231 -57.07147 -55.44705 -55.71491 -55.96074 -56.18380 -56.38641 -56.57128 -56.74089 -56.89733
500 -61.29754 -52.99925 -50.85048 -49.45982 -49.73170 -49.98047 -50.20595 -50.41066 -50.59740 -50.76870 -50.92669
600 -54.68852 -48.21310 -46.51937 -45.35830 -45.63296 -45.88376 -46.11092 -46.31710 -46.50514 -46.67761 -46.83667
700 -49.86932 -43.63428 -42.73554 -41.69092 -41.96761 -42.21998 -42.44851 -42.65589 -42.84502 -43.01849 -43.17847
800 -47.28646 -40.84318 -39.83741 -38.90791 -39.18626 -39.43990 -39.66949 -39.87782 -40.06781 -40.24205 -40.40275
900 -44.47935 -39.27202 -38.49220 -37.45284 -37.73203 -37.98631 -38.21645 -38.42526 -38.61568 -38.79033 -38.95140
1000 -42.55712 -37.69901 -37.14587 -36.22036 -36.50042 -36.75529 -36.98592 -37.19517 -37.38598 -37.56098 -37.72237
cv_spk <- bw_tnkde_cv_likelihood_calc(
  bws_net = seq(100,1000,100),
  bws_time = seq(10,60,5),
  lines = spk_network,
  events = spk_acc,
  time_field = "hour",
  w = rep(1, nrow(spk_acc)),
  kernel_name = "quartic",
  method = "discontinuous", 
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 8,
  digits = 1,
  tol = 0.1,
  agg = 5,
  sparse=TRUE,
  grid_shape=c(10,10),
  sub_sample=1,
  verbose = FALSE,
  check = TRUE)

write_rds(cv_spk, "data/rds/cv_spk.rds")

The output is again a matrix of cross-validation log-likelihood scores, with rows corresponding to network bandwidths (in metres) and columns to time bandwidths; higher (less negative) scores indicate a better fit:

10 15 20 25 30 35 40 45 50 55 60
100 -237.91197 -210.65241 -197.30990 -194.75027 -194.95486 -195.14583 -195.32013 -195.47882 -195.62376 -195.75681 -195.87957
200 -183.65331 -156.26772 -145.46229 -142.01147 -142.23908 -142.45049 -142.64311 -142.81837 -142.97841 -143.12530 -143.26082
300 -150.72171 -124.94252 -115.39506 -113.80690 -114.04852 -114.27148 -114.47425 -114.65859 -114.82686 -114.98127 -115.12371
400 -123.28002 -100.61008 -93.84198 -92.57890 -92.83060 -93.06205 -93.27233 -93.46342 -93.63783 -93.79786 -93.94547
500 -107.83014 -87.33140 -80.58086 -79.63046 -79.88865 -80.12542 -80.34035 -80.53562 -80.71380 -80.87728 -81.02808
600 -92.92595 -73.98646 -67.55693 -67.23058 -67.49554 -67.73765 -67.95717 -68.15651 -68.33837 -68.50520 -68.65908
700 -83.85252 -68.59219 -62.78504 -62.46360 -62.73159 -62.97590 -63.19726 -63.39821 -63.58151 -63.74965 -63.90472
800 -78.14438 -64.74374 -59.55715 -59.23880 -59.50865 -59.75436 -59.97690 -60.17890 -60.36315 -60.53215 -60.68802
900 -75.49196 -62.40967 -57.22976 -56.91380 -57.18506 -57.43182 -57.65525 -57.85803 -58.04299 -58.21263 -58.36909
1000 -67.95707 -56.42548 -50.64316 -50.02889 -50.30212 -50.55113 -50.77675 -50.98157 -51.16843 -51.33984 -51.49794
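The bandwidth pair carried forward into tnkde() is the one that maximises this score. With the real cv_bkk or cv_spk matrix in hand, the optimum can be extracted as in this toy sketch (the matrix below is a small made-up excerpt, not the actual output):

```r
# Toy score matrix standing in for cv_bkk: rows are network bandwidths,
# columns are time bandwidths, entries are CV log-likelihood scores
cv <- matrix(c(-181, -150, -139,
               -115,  -96,  -88,
                -88,  -73,  -66),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("100", "200", "300"), c("10", "15", "20")))

# Locate the cell with the highest score and read off both bandwidths
best <- which(cv == max(cv), arr.ind = TRUE)
bw_net_opt  <- as.numeric(rownames(cv)[best[1, "row"]])
bw_time_opt <- as.numeric(colnames(cv)[best[1, "col"]])
c(bw_net = bw_net_opt, bw_time = bw_time_opt)
```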

7.4 Calculating TNKDE

Using the optimal set of spatial and temporal bandwidths, we can perform TNKDE calculation.

We first create a sequence of hourly time points from our dataset. min() retrieves the minimum value while max() retrieves the maximum value from the hour column. seq() generates a sequence of numbers from the minimum hour and ending at the maximum hour, with an increment of 1 (i.e. giving us hourly intervals). This sequence will be used to visualise accident data at different times of the day.
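As a quick base-R illustration with hypothetical hours (the real code uses bkk_acc$hour); the 3-hour variant corresponds to the coarser animation discussed later:

```r
hour <- c(0, 3, 7, 8, 9, 17, 18, 23)  # hypothetical stand-in for bkk_acc$hour

sample_time_1h <- seq(min(hour), max(hour), 1)  # hourly sample times: 0, 1, ..., 23
sample_time_3h <- seq(min(hour), max(hour), 3)  # coarser steps: 0, 3, ..., 21

length(sample_time_1h)
sample_time_3h
```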

bkk_sample_time <- seq(min(bkk_acc$hour), max(bkk_acc$hour), 1)
bkk_tnkde_densities <- tnkde(lines = bkk_network,
                   events = bkk_acc,
                   time_field = "hour",
                   w = rep(1, nrow(bkk_acc)), 
                   samples_loc = bkk_samples,
                   samples_time = bkk_sample_time, 
                   kernel_name = "quartic",
                   bw_net = 1000, bw_time = 25,
                   adaptive = TRUE,
                   trim_bw_net = 900,
                   trim_bw_time = 8,
                   method = "discontinuous",
                   div = "bw", max_depth = 8,
                   digits = 1, tol = 0.01,
                   agg = 5, grid_shape = c(10,10), 
                   verbose  = FALSE)
spk_sample_time <- seq(min(spk_acc$hour), max(spk_acc$hour), 1)
spk_tnkde_densities <- tnkde(lines = spk_network,
                   events = spk_acc,
                   time_field = "hour",
                   w = rep(1, nrow(spk_acc)), 
                   samples_loc = spk_samples,
                   samples_time = spk_sample_time, 
                   kernel_name = "quartic",
                   bw_net = 1000, bw_time = 25,
                   adaptive = TRUE,
                   trim_bw_net = 900,
                   trim_bw_time = 8,
                   method = "discontinuous",
                   div = "bw", max_depth = 8,
                   digits = 1, tol = 0.01,
                   agg = 5, grid_shape = c(10,10), 
                   verbose  = FALSE)

7.5 Visualising TNKDE

7.5.1 Bangkok

Show the code
# Create a color palette for all the densities
all_densities_bkk <- c(bkk_tnkde_densities$k)
color_breaks_bkk <- classInt::classIntervals(all_densities_bkk, 
                                             n = 10, 
                                             style = "kmeans")

# Generate a map at each sample time
maps_bkk <- lapply(1:ncol(bkk_tnkde_densities$k), function(i){
  time <- bkk_sample_time[[i]]
  
  bkk_samples$tnkde_density <- bkk_tnkde_densities$k[,i]
  
  map1 <- tm_shape(bkk_samples) + 
  tm_dots(col = "tnkde_density", size = 0.01,
          breaks = color_breaks_bkk$brks, 
          palette = viridis::viridis(10)) + 
    tm_layout(legend.show=FALSE, 
              main.title = paste("TNKDE of Accidents in Bangkok - ", as.character(time), ":00"), 
              main.title.size = 0.5,
              bg.color = "#E4D5C9",
              frame = F)
  return(map1)
})

# Create a gif with all the maps
tmap_animation(maps_bkk, filename = "images/tnkde_bkk.gif", 
               width = 1000, height = 1000, dpi = 300, delay = 50)

Although hotspot locations shift on an hourly basis, the hourly maps visually smooth out these changes between hours. In the animation below, we applied the same process but with a 3-hour interval, allowing for clearer visualisation of hotspot variations over time.

7.5.2 Samut Prakan

Show the code
# Create a color palette for all the densities
all_densities_spk <- c(spk_tnkde_densities$k)
color_breaks_spk <- classInt::classIntervals(all_densities_spk, n = 10, style = "kmeans")

# Generate a map at each sample time
maps_spk <- lapply(1:ncol(spk_tnkde_densities$k), function(i){
  time <- spk_sample_time[[i]]
  
  spk_samples$tnkde_density <- spk_tnkde_densities$k[,i]
  map1 <- tm_shape(spk_samples) + 
  tm_dots(col = "tnkde_density", size = 0.01,
          breaks = color_breaks_spk$brks, palette = viridis::viridis(10)) + 
    tm_layout(legend.show=FALSE, 
              main.title = paste("TNKDE of Accidents in Samut Prakan - ", as.character(time), ":00"), 
              main.title.size = 0.5,
              bg.color = "#E4D5C9",
              frame = F)
  return(map1)
})

# Create a gif with all the maps
tmap_animation(maps_spk, filename = "images/tnkde_spk.gif", 
               width = 1000, height = 1000, dpi = 300, delay = 50)

Using 3-hour intervals:

8 Conclusion

This study examined different methods of spatial point pattern analysis using road accident data from Thailand. We utilised three distinct approaches: Traditional Kernel Density Estimation (KDE), Network-Constrained Kernel Density Estimation (NKDE), and Temporal Network Kernel Density Estimation (TNKDE). These techniques helped uncover both spatial and temporal trends in accident clustering within the Bangkok Metropolitan Region (BMR).

We experimented with various bandwidth selection techniques and kernel functions, comparing the outcomes for each combination. We produced three KDE maps using different strategies: fixed bandwidth, adaptive nearest neighbour, and adaptive kernel density. These maps pinpointed potential accident hotspots in two provinces within the BMR, namely Bangkok and Samut Prakan, where accidents showed significant spatial clustering.

A key part of our analysis involved comparing the performance of traditional, pixel-based KDE with network-based NKDE. Traditional KDE was limited in its ability to capture the spatial distribution of accidents along road networks due to its reliance on Euclidean distance. NKDE, on the other hand, offered a more precise and accurate representation by accounting for the constraints of road networks. This approach led to a more nuanced understanding of accident hotspots, providing clearer and more reliable results than the traditional method.

The introduction of TNKDE added further depth to the analysis by incorporating time as a factor, allowing us to observe how accident clusters vary over different time intervals. This provided a more dynamic perspective on the spatial distribution of accidents.

Looking ahead, future research could incorporate additional data from the original Thailand Road Accidents dataset, such as weather conditions, road type, and traffic characteristics, to provide a more holistic view of the factors contributing to road accidents. Integrating these variables could enhance the analysis and offer a richer understanding of accident patterns.

9 References

Gelb, J. (2021). spNetwork: A Package for Network Kernel Density Estimation. The R Journal, 13(2). https://journal.r-project.org/archive/2021/RJ-2021-102/RJ-2021-102.pdf

Gelb, J. (2024). Temporal Network Kernel Density Estimate. spNetwork. https://jeremygelb.github.io/spNetwork/articles/TNKDE.html#spatio-temporal

Gelb, J. (2024). Network Kernel Density Estimate. spNetwork. https://jeremygelb.github.io/spNetwork/articles/NKDE.html

Kam, T.S. (2023). 1st Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap04

Kam, T.S. (2023). 2nd Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap05

Kam, T.S. (2023). Spatio-Temporal Point Patterns Analysis. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap06

Kam, T.S. (2023). Network Constrained Spatial Point Patterns Analysis. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap07